Wald (Inverse Gaussian) distribution (wald)#

The Wald distribution is a classic continuous model for positive, right-skewed waiting times. It is most famous as the first-passage time distribution of a Brownian motion with drift (a core story in sequential analysis and drift–diffusion models).

In SciPy, scipy.stats.wald is the standardized Wald (a special case of the inverse Gaussian). In the literature, “Wald distribution” is often used interchangeably with the inverse Gaussian distribution.


1) Title & classification#

| Item | Value |
| --- | --- |
| Name | Wald / inverse Gaussian (wald) |
| Type | Continuous |
| Support (standard) | \(x \in (0,\infty)\) |
| Parameter space (inverse Gaussian form) | \((\mu,\lambda) \in (0,\infty)\times(0,\infty)\) |
| SciPy wald | standardized case \(\mu=1,\ \lambda=1\) with optional \(\mathrm{loc}\in\mathbb{R}\), \(\mathrm{scale}>0\) |

What you’ll be able to do after this notebook#

  • recognize when a Wald / inverse Gaussian model makes sense (especially first-passage times)

  • write the PDF/CDF and compute key moments

  • interpret how \((\mu,\lambda)\) change shape (and how SciPy parameterizes them)

  • derive mean/variance (via the MGF) and write down the log-likelihood

  • sample from an inverse Gaussian using NumPy only

  • use scipy.stats.wald and scipy.stats.invgauss for simulation and fitting

Notebook roadmap#

  1. Title & classification

  2. Intuition & motivation

  3. Formal definition (PDF/CDF)

  4. Moments & properties

  5. Parameter interpretation

  6. Derivations (\(\mathbb{E}[X]\), \(\mathrm{Var}(X)\), likelihood)

  7. Sampling & simulation (NumPy-only)

  8. Visualization (PDF, CDF, Monte Carlo)

  9. SciPy integration (scipy.stats.wald, scipy.stats.invgauss)

  10. Statistical use cases

  11. Pitfalls

  12. Summary

import math

import numpy as np
import scipy
from scipy import stats
from scipy.special import log_ndtr

import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

SEED = 123
rng = np.random.default_rng(SEED)

np.set_printoptions(precision=4, suppress=True)

print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy  1.26.2
scipy  1.15.0
plotly 6.5.2

Prerequisites & notation#

Prerequisites

  • comfort with calculus (differentiation, basic integrals)

  • probability basics (PDF/CDF, expectation, likelihood)

  • optional: familiarity with Brownian motion / diffusion models

Notation (inverse Gaussian parameterization)

We use parameters \((\mu,\lambda)\) with

  • \(\mu>0\) = mean parameter

  • \(\lambda>0\) = shape (sometimes called “scale” in older sources)

and write

\[X \sim \mathrm{IG}(\mu,\lambda).\]

We also use \(\Phi(\cdot)\) for the standard normal CDF.

SciPy mapping

  • scipy.stats.wald is the standardized case (no shape parameters):

\[f(x)=\frac{1}{\sqrt{2\pi x^3}}\exp\!\left(-\frac{(x-1)^2}{2x}\right),\quad x>0.\]

  • For the general inverse Gaussian \(\mathrm{IG}(\mu,\lambda)\), SciPy recommends scipy.stats.invgauss with invgauss(mu=mu/lam, scale=lam, loc=0).
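A quick sanity check of this mapping (a minimal sketch; the grid x_chk and its values are arbitrary illustration, not part of the mapping itself):

import numpy as np
from scipy import stats

# wald is IG(1, 1), so invgauss(mu=1/1, scale=1) should reproduce it exactly
x_chk = np.linspace(0.1, 5.0, 9)
print(np.allclose(stats.wald.pdf(x_chk), stats.invgauss.pdf(x_chk, 1.0, scale=1.0)))  # True
print(np.allclose(stats.wald.cdf(x_chk), stats.invgauss.cdf(x_chk, 1.0, scale=1.0)))  # True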

2) Intuition & motivation#

What it models#

The Wald / inverse Gaussian distribution is a model for a positive time to reach a threshold when progress is noisy but has a consistent drift.

A canonical story:

  • Let \(B_t\) be standard Brownian motion.

  • Consider a drift–diffusion process \(X_t = \nu t + \sigma B_t\) with drift \(\nu>0\) and diffusion scale \(\sigma>0\).

  • Define the first-passage time to level \(a>0\): \(T = \inf\{t\ge 0 : X_t = a\}\).

Then \(T\) has an inverse Gaussian distribution:

\[T \sim \mathrm{IG}\left(\mu=\frac{a}{\nu},\ \lambda=\frac{a^2}{\sigma^2}\right).\]

So:

  • larger drift \(\nu\) means the threshold is reached sooner (smaller \(\mu\))

  • larger noise \(\sigma\) makes the time more variable (smaller \(\lambda\))
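As a numerical illustration of this story, the sketch below simulates the drifted diffusion on a fine Euler grid and compares the empirical first-passage times with the \(\mathrm{IG}(a/\nu,\ a^2/\sigma^2)\) prediction. This is an approximation we add for intuition (the discrete grid can miss crossings between steps, so expect a small bias that shrinks with dt); the parameter values and variable names are illustrative.

import numpy as np

rng_fp = np.random.default_rng(0)
nu, sigma, a = 1.2, 0.8, 2.0              # drift, diffusion scale, threshold
dt, n_paths, n_steps = 1e-3, 2_000, 10_000

x = np.zeros(n_paths)
t_hit = np.full(n_paths, np.nan)
alive = np.ones(n_paths, dtype=bool)
for k in range(1, n_steps + 1):
    # Euler step for X_t = nu*t + sigma*B_t, only for paths still below a
    x[alive] += nu * dt + sigma * np.sqrt(dt) * rng_fp.standard_normal(alive.sum())
    crossed = alive & (x >= a)
    t_hit[crossed] = k * dt
    alive &= ~crossed

t = t_hit[~np.isnan(t_hit)]
mu_th, lam_th = a / nu, a**2 / sigma**2
print("crossed fraction:", t.size / n_paths)
print("empirical vs theory mean:", (float(t.mean()), mu_th))
print("empirical vs theory var :", (float(t.var()), mu_th**3 / lam_th))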

Typical real-world use cases#

  • Response times in cognitive models (drift–diffusion / sequential probability ratio tests)

  • Time-to-failure / fatigue-life models (some engineering contexts)

  • Queueing / transport: positive travel times with asymmetric right tails

  • Finance: first-hitting times of drifted diffusions

Relations to other distributions#

  • The Wald is a special case of the inverse Gaussian (invgauss in SciPy).

  • For large \(\lambda\) (small relative variance), \(\mathrm{IG}(\mu,\lambda)\) is approximately Normal: \(X \approx \mathcal{N}\!\left(\mu,\ \frac{\mu^3}{\lambda}\right)\) (see the quick check after this list).

  • The inverse Gaussian is used as a mixing distribution in normal variance–mean mixtures, yielding the normal-inverse-Gaussian family.
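A quick numerical look at the Normal approximation (the parameter values here are illustrative, with \(\lambda\) taken large relative to \(\mu\)):

import numpy as np
from scipy import stats

mu_n, lam_n = 2.0, 400.0                       # CV = sqrt(mu/lam) ≈ 0.07
rv_ig_n = stats.invgauss(mu_n / lam_n, scale=lam_n)
rv_norm = stats.norm(loc=mu_n, scale=np.sqrt(mu_n**3 / lam_n))
x_n = np.linspace(1.6, 2.4, 5)
print(np.round(rv_ig_n.pdf(x_n), 3))
print(np.round(rv_norm.pdf(x_n), 3))           # close; a small right skew remains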

3) Formal definition#

PDF (inverse Gaussian form)#

For \(\mu>0\), \(\lambda>0\), the inverse Gaussian density is

\[f(x;\mu,\lambda) =\sqrt{\frac{\lambda}{2\pi x^3}}\, \exp\!\left(-\frac{\lambda(x-\mu)^2}{2\mu^2 x}\right),\quad x>0.\]

Wald as a special case#

The standardized Wald distribution corresponds to \(\mu=1\), \(\lambda=1\):

\[f(x)=\frac{1}{\sqrt{2\pi x^3}}\exp\!\left(-\frac{(x-1)^2}{2x}\right),\quad x>0.\]

CDF#

Let \(\Phi\) denote the standard normal CDF. For \(x>0\),

\[F(x;\mu,\lambda) = \Phi\!\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}-1\right)\right) + \exp\!\left(\frac{2\lambda}{\mu}\right) \Phi\!\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}+1\right)\right).\]
def invgauss_logpdf(x, mu, lam):
    """Log-PDF of IG(mu, lam) on x>0 (mu>0, lam>0)."""
    x = np.asarray(x, dtype=float)
    mu = float(mu)
    lam = float(lam)

    if mu <= 0 or lam <= 0:
        return np.full_like(x, -np.inf, dtype=float)

    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xm = x[mask]
    out[mask] = (
        0.5 * (np.log(lam) - np.log(2.0 * np.pi))
        - 1.5 * np.log(xm)
        - (lam * (xm - mu) ** 2) / (2.0 * mu**2 * xm)
    )
    return out


def invgauss_pdf(x, mu, lam):
    return np.exp(invgauss_logpdf(x, mu, lam))


def invgauss_cdf(x, mu, lam):
    """CDF of IG(mu, lam) using a numerically-stable log-space form."""
    x = np.asarray(x, dtype=float)
    mu = float(mu)
    lam = float(lam)

    out = np.zeros_like(x, dtype=float)
    if mu <= 0 or lam <= 0:
        out[:] = np.nan
        return out

    mask = x > 0
    xm = x[mask]
    z1 = np.sqrt(lam / xm) * (xm / mu - 1.0)
    z2 = -np.sqrt(lam / xm) * (xm / mu + 1.0)

    # F = Phi(z1) + exp(2*lam/mu) * Phi(z2)
    log_term1 = log_ndtr(z1)
    log_term2 = (2.0 * lam / mu) + log_ndtr(z2)
    out[mask] = np.exp(np.logaddexp(log_term1, log_term2))
    return out


def wald_pdf(x):
    return invgauss_pdf(x, mu=1.0, lam=1.0)


def wald_cdf(x):
    return invgauss_cdf(x, mu=1.0, lam=1.0)


# Quick checks vs SciPy
xs = np.linspace(0.05, 6.0, 7)
print("wald pdf match:", np.allclose(wald_pdf(xs), stats.wald.pdf(xs)))
print("wald cdf match:", np.allclose(wald_cdf(xs), stats.wald.cdf(xs)))

# General IG(mu, lam) via scipy.stats.invgauss(mu/lam, scale=lam)
mu_test, lam_test = 1.7, 4.2
rv_ig = stats.invgauss(mu_test / lam_test, scale=lam_test)
print("IG pdf match:", np.allclose(invgauss_pdf(xs, mu_test, lam_test), rv_ig.pdf(xs)))
print("IG cdf match:", np.allclose(invgauss_cdf(xs, mu_test, lam_test), rv_ig.cdf(xs)))
wald pdf match: True
wald cdf match: True
IG pdf match: True
IG cdf match: True

4) Moments & properties#

For \(X\sim\mathrm{IG}(\mu,\lambda)\):

| Quantity | Value |
| --- | --- |
| Mean | \(\mathbb{E}[X] = \mu\) |
| Variance | \(\mathrm{Var}(X)=\dfrac{\mu^3}{\lambda}\) |
| Skewness | \(\gamma_1 = 3\sqrt{\dfrac{\mu}{\lambda}}\) |
| (Excess) kurtosis | \(\gamma_2 = 15\dfrac{\mu}{\lambda}\) (so kurtosis \(= 3+\gamma_2\)) |

A useful scale-free summary is the coefficient of variation:

\[\mathrm{CV} = \frac{\sqrt{\mathrm{Var}(X)}}{\mathbb{E}[X]} = \sqrt{\frac{\mu}{\lambda}}.\]

MGF and characteristic function#

For \(t < \lambda/(2\mu^2)\), the MGF is

\[M_X(t)=\mathbb{E}[e^{tX}] = \exp\!\left(\frac{\lambda}{\mu}\left(1-\sqrt{1-\frac{2\mu^2 t}{\lambda}}\right)\right).\]

The characteristic function follows by substituting \(t\mapsto it\).
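A minimal Monte Carlo check of the MGF at one admissible point (the values of mu_m, lam_m, and t0 are illustrative; t0 must stay below \(\lambda/(2\mu^2)\)):

import numpy as np
from scipy import stats

mu_m, lam_m = 1.5, 3.0
t0 = 0.2                                 # admissible: t0 < lam_m/(2*mu_m**2) = 2/3
mgf_closed = np.exp((lam_m / mu_m) * (1.0 - np.sqrt(1.0 - 2.0 * mu_m**2 * t0 / lam_m)))

rng_mgf = np.random.default_rng(42)
xs_mc = stats.invgauss(mu_m / lam_m, scale=lam_m).rvs(size=200_000, random_state=rng_mgf)
print("closed form:", float(mgf_closed))
print("Monte Carlo:", float(np.mean(np.exp(t0 * xs_mc))))   # should agree to ~2 decimals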

Entropy#

The differential entropy is

\[h(X) = -\int_0^{\infty} f(x;\mu,\lambda)\,\log f(x;\mu,\lambda)\,dx,\]

which does not simplify to a short elementary closed form. In practice, compute it numerically (SciPy provides .entropy()), or estimate it by Monte Carlo via \(-\mathbb{E}[\log f(X)]\).

mu, lam = 1.5, 3.0

# Theory
mean_th = mu
var_th = mu**3 / lam
skew_th = 3.0 * math.sqrt(mu / lam)
exkurt_th = 15.0 * (mu / lam)

print("theory mean,var,skew,exkurt:", (mean_th, var_th, skew_th, exkurt_th))

# SciPy check (invgauss(mu/lam, scale=lam) corresponds to IG(mu, lam))
rv = stats.invgauss(mu / lam, scale=lam)
mean_sp, var_sp, skew_sp, exkurt_sp = rv.stats(moments="mvsk")
print("scipy  mean,var,skew,exkurt:", (float(mean_sp), float(var_sp), float(skew_sp), float(exkurt_sp)))

# Entropy: SciPy vs Monte Carlo estimate
h_scipy = float(rv.entropy())
samples = rv.rvs(size=80_000, random_state=rng)
h_mc = float(-np.mean(invgauss_logpdf(samples, mu, lam)))
print("entropy scipy:", h_scipy)
print("entropy MC   :", h_mc)
theory mean,var,skew,exkurt: (1.5, 1.125, 2.121320343559643, 7.5)
scipy  mean,var,skew,exkurt: (1.5, 1.125, 2.121320343559643, 7.5)
entropy scipy: 1.1683115761812808
entropy MC   : 1.1673168314822207

5) Parameter interpretation#

Meaning of parameters#

  • \(\mu\) sets the typical scale: it is the mean and (often) close to the mode when the distribution is concentrated.

  • \(\lambda\) controls concentration around \(\mu\):

\[\mathrm{Var}(X)=\frac{\mu^3}{\lambda} \quad\Rightarrow\quad \lambda\uparrow \;\Longrightarrow\; \text{smaller variance and less skew.}\]

A helpful way to think about shape is via the ratio \(\mu/\lambda\):

  • relative spread: \(\mathrm{CV}=\sqrt{\mu/\lambda}\)

  • skewness: \(3\sqrt{\mu/\lambda}\)

Shape changes#

  • Fix \(\mu\) and increase \(\lambda\): the density becomes sharply peaked near \(\mu\) and looks increasingly Normal.

  • Fix \(\lambda\) and increase \(\mu\): the mean shifts right and dispersion grows quickly (since variance scales like \(\mu^3\)).

# PDF: varying mu (keep lambda fixed)
lam_fixed = 3.0
mus = [0.6, 1.0, 1.8, 3.0]

x_max = max(stats.invgauss(mu / lam_fixed, scale=lam_fixed).ppf(0.995) for mu in mus)
x = np.linspace(1e-6, float(x_max), 900)

fig = go.Figure()
for mu_i in mus:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=invgauss_pdf(x, mu_i, lam_fixed),
            mode="lines",
            name=f"μ={mu_i}, λ={lam_fixed}",
        )
    )

fig.update_layout(
    title="Inverse Gaussian PDF: varying μ (λ fixed)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()
# PDF: varying lambda (keep mu fixed)
mu_fixed = 1.2
lams = [0.5, 1.0, 3.0, 10.0]

x_max = max(stats.invgauss(mu_fixed / lam, scale=lam).ppf(0.995) for lam in lams)
x = np.linspace(1e-6, float(x_max), 900)

fig = go.Figure()
for lam_i in lams:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=invgauss_pdf(x, mu_fixed, lam_i),
            mode="lines",
            name=f"μ={mu_fixed}, λ={lam_i}",
        )
    )

fig.update_layout(
    title="Inverse Gaussian PDF: varying λ (μ fixed)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()

6) Derivations#

Expectation and variance (via cumulant generating function)#

Let \(K(t)=\log M_X(t)\) where

\[K(t)=\frac{\lambda}{\mu}\left(1-\sqrt{1-\frac{2\mu^2 t}{\lambda}}\right).\]

Cumulants are derivatives at \(t=0\):

  • \(K'(0)=\mathbb{E}[X]\)

  • \(K''(0)=\mathrm{Var}(X)\)

Differentiate:

\[K'(t) = \frac{\lambda}{\mu}\cdot \frac{\mu^2/\lambda}{\sqrt{1-\frac{2\mu^2 t}{\lambda}}} = \frac{\mu}{\sqrt{1-\frac{2\mu^2 t}{\lambda}}} \quad\Rightarrow\quad K'(0)=\mu.\]

Differentiate again:

\[K''(t)=\mu\cdot \frac{\mu^2/\lambda}{\left(1-\frac{2\mu^2 t}{\lambda}\right)^{3/2}} = \frac{\mu^3/\lambda}{\left(1-\frac{2\mu^2 t}{\lambda}\right)^{3/2}} \quad\Rightarrow\quad K''(0)=\frac{\mu^3}{\lambda}.\]
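If sympy is available (it is not imported elsewhere in this notebook, so treat this as an optional aside), both derivatives can be verified symbolically:

import sympy as sp

t_, mu_, lam_ = sp.symbols("t mu lam", positive=True)
K = (lam_ / mu_) * (1 - sp.sqrt(1 - 2 * mu_**2 * t_ / lam_))
print(sp.simplify(sp.diff(K, t_).subs(t_, 0)))       # mu
print(sp.simplify(sp.diff(K, t_, 2).subs(t_, 0)))    # mu**3/lam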

Likelihood#

For i.i.d. data \(x_1,\dots,x_n\) from \(\mathrm{IG}(\mu,\lambda)\), the log-likelihood is

\[\ell(\mu,\lambda) = \sum_{i=1}^n \log f(x_i;\mu,\lambda)\]

with

\[\log f(x;\mu,\lambda) = \tfrac12\log\lambda - \tfrac12\log(2\pi) - \tfrac32\log x - \frac{\lambda(x-\mu)^2}{2\mu^2 x}.\]

A well-known MLE result (for the inverse Gaussian form) is:

\[\hat\mu = \bar x,\qquad \hat\lambda = \frac{n}{\sum_{i=1}^n \frac{(x_i-\bar x)^2}{\bar x^2 x_i}}.\]
def invgauss_loglik(x, mu, lam):
    x = np.asarray(x, dtype=float)
    return float(np.sum(invgauss_logpdf(x, mu, lam)))


def invgauss_mle(x):
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("All observations must be > 0 for IG(mu, lam).")
    n = x.size
    mu_hat = float(np.mean(x))
    denom = float(np.sum((x - mu_hat) ** 2 / (mu_hat**2 * x)))
    lam_hat = float(n / denom)
    return mu_hat, lam_hat


def invgauss_mle_lam_given_mu(x, mu_fixed):
    x = np.asarray(x, dtype=float)
    mu_fixed = float(mu_fixed)
    if mu_fixed <= 0:
        raise ValueError("mu_fixed must be > 0")
    if np.any(x <= 0):
        raise ValueError("All observations must be > 0")
    n = x.size
    return float(n * mu_fixed**2 / np.sum((x - mu_fixed) ** 2 / x))


# Demonstration on synthetic data
mu_true, lam_true = 1.7, 4.0
rv_true = stats.invgauss(mu_true / lam_true, scale=lam_true)
x = rv_true.rvs(size=3000, random_state=rng)

mu_hat, lam_hat = invgauss_mle(x)
print("true (mu, lam):", (mu_true, lam_true))
print("mle  (mu, lam):", (mu_hat, lam_hat))
print("loglik at true:", invgauss_loglik(x, mu_true, lam_true))
print("loglik at mle :", invgauss_loglik(x, mu_hat, lam_hat))
true (mu, lam): (1.7, 4.0)
mle  (mu, lam): (1.6979415068846022, 3.9342328565348255)
loglik at true: -3765.005001936604
loglik at mle : -3764.79254307707

7) Sampling & simulation#

NumPy-only sampler (Michael–Schucany–Haas)#

A standard exact algorithm to sample \(X\sim\mathrm{IG}(\mu,\lambda)\) uses only Normal and Uniform random variables:

  1. Draw \(V\sim\mathcal{N}(0,1)\) and set \(Y=V^2\).

  2. Form a candidate

\[X_1 = \mu + \frac{\mu^2 Y}{2\lambda} - \frac{\mu}{2\lambda}\sqrt{4\mu\lambda Y + \mu^2 Y^2}.\]

  3. Draw \(U\sim\mathrm{Unif}(0,1)\).

  4. Return \(X=X_1\) if \(U\le \frac{\mu}{\mu+X_1}\), else return \(X=\frac{\mu^2}{X_1}\).

This produces exact inverse Gaussian samples and is fast and vectorizable.

def invgauss_rvs_numpy(mu, lam, size=1, rng=None):
    """Sample IG(mu, lam) using NumPy only (Michael–Schucany–Haas)."""
    if rng is None:
        rng = np.random.default_rng()

    mu = float(mu)
    lam = float(lam)
    if mu <= 0 or lam <= 0:
        raise ValueError("mu and lam must be > 0")

    v = rng.standard_normal(size)
    y = v * v

    mu2 = mu * mu
    x1 = mu + (mu2 * y) / (2.0 * lam) - (mu / (2.0 * lam)) * np.sqrt(4.0 * mu * lam * y + mu2 * y * y)

    u = rng.random(size)
    x = np.where(u <= (mu / (mu + x1)), x1, mu2 / x1)
    return x


# Quick validation of the sampler
mu, lam = 1.5, 3.0
n = 80_000
s = invgauss_rvs_numpy(mu, lam, size=n, rng=rng)

print("sample mean/var:", (float(np.mean(s)), float(np.var(s, ddof=0))))
print("theory  mean/var:", (mu, mu**3 / lam))
sample mean/var: (1.4978110241273188, 1.1084760209581126)
theory  mean/var: (1.5, 1.125)

8) Visualization#

We’ll visualize:

  • the PDF vs a histogram of Monte Carlo samples

  • the CDF vs the empirical CDF from samples

mu, lam = 1.3, 2.5
rv = stats.invgauss(mu / lam, scale=lam)

samples = invgauss_rvs_numpy(mu, lam, size=60_000, rng=rng)
x_max = float(rv.ppf(0.995))
x = np.linspace(1e-6, x_max, 900)

# PDF + histogram
fig = px.histogram(
    x=samples,
    nbins=120,
    histnorm="probability density",
    title="IG(μ, λ): histogram vs theoretical PDF",
    labels={"x": "x"},
)
fig.add_trace(go.Scatter(x=x, y=invgauss_pdf(x, mu, lam), mode="lines", name="theory pdf"))
fig.update_layout(bargap=0.02)
fig.show()

# CDF + empirical CDF
xs = np.sort(samples)
ecdf = np.arange(1, xs.size + 1) / xs.size

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=invgauss_cdf(x, mu, lam), mode="lines", name="theory cdf"))
fig.add_trace(go.Scatter(x=xs[::50], y=ecdf[::50], mode="markers", name="empirical cdf", opacity=0.6))
fig.update_layout(
    title="IG(μ, λ): empirical CDF vs theoretical CDF",
    xaxis_title="x",
    yaxis_title="cdf",
)
fig.show()

9) SciPy integration#

scipy.stats.wald#

  • standardized Wald distribution (no shape parameters)

  • supports generic loc and scale transforms

scipy.stats.invgauss#

For the full inverse Gaussian \(\mathrm{IG}(\mu,\lambda)\), use:

rv = scipy.stats.invgauss(mu / lam, scale=lam, loc=0)

This matches the \((\mu,\lambda)\) parameterization used throughout this notebook.

# wald: pdf/cdf/rvs
x = np.linspace(1e-4, 6, 600)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=stats.wald.pdf(x), mode="lines", name="wald pdf"))
fig.update_layout(title="scipy.stats.wald PDF (standardized)", xaxis_title="x", yaxis_title="pdf")
fig.show()

s_wald = stats.wald.rvs(size=10_000, random_state=rng)
print("wald sample mean/var:", (float(np.mean(s_wald)), float(np.var(s_wald))))

# wald.fit fits loc/scale (note: loc/scale are generic, not part of the classic (mu, lam) parameterization)
loc_true, scale_true = 0.2, 1.5
data = stats.wald.rvs(loc=loc_true, scale=scale_true, size=5000, random_state=rng)
loc_hat, scale_hat = stats.wald.fit(data)
print("wald true loc,scale:", (loc_true, scale_true))
print("wald fit  loc,scale:", (float(loc_hat), float(scale_hat)))

# Inverse Gaussian (mu, lam) via invgauss
mu_true, lam_true = 1.7, 4.0
rv_ig = stats.invgauss(mu_true / lam_true, scale=lam_true)
data = rv_ig.rvs(size=5000, random_state=rng)

# Fit invgauss; fix loc=0 to match the (mu, lam) form.
mu_shape_hat, loc_hat, scale_hat = stats.invgauss.fit(data, floc=0)
mu_hat = float(mu_shape_hat * scale_hat)
lam_hat = float(scale_hat)
print("IG true (mu, lam):", (mu_true, lam_true))
print("IG fit  (mu, lam):", (mu_hat, lam_hat))
wald sample mean/var: (0.9850123489974157, 0.9642769362574072)
wald true loc,scale: (0.2, 1.5)
wald fit  loc,scale: (0.19128589639592256, 1.5316481216288031)
IG true (mu, lam): (1.7, 4.0)
IG fit  (mu, lam): (1.707057237926007, 4.055512548439454)

10) Statistical use cases#

A) Hypothesis testing (likelihood ratio test for \(\mu\))#

A common inferential question in first-passage-time models is whether the mean time \(\mu\) equals a value implied by a hypothesized drift.

We can test

\[H_0: \mu = \mu_0 \quad\text{vs}\quad H_1: \mu \ne \mu_0\]

using the likelihood ratio statistic

\[\Lambda = 2\left(\ell(\hat\mu,\hat\lambda) - \ell(\mu_0, \hat\lambda_{\mu_0})\right),\]

which is approximately \(\chi^2_1\) under \(H_0\) for large \(n\).

B) Bayesian modeling#

Inverse Gaussian likelihoods are common for positive time data. Even without closed-form conjugacy, you can do simple grid-based Bayes or use modern samplers (PyMC/Stan) for hierarchical models.

C) Generative modeling#

The inverse Gaussian is a standard mixing distribution: if \(V\sim\mathrm{IG}\) and \(Y\mid V\sim\mathcal{N}(\beta V,\ V)\), then \(Y\) has heavier tails than a Gaussian (this leads to the normal-inverse-Gaussian family).

# A) LRT example for mu
mu_true, lam_true = 1.4, 3.5
rv = stats.invgauss(mu_true / lam_true, scale=lam_true)
x = rv.rvs(size=3000, random_state=rng)

mu0 = 1.2

mu_hat, lam_hat = invgauss_mle(x)
lam_hat_mu0 = invgauss_mle_lam_given_mu(x, mu0)

ll_alt = invgauss_loglik(x, mu_hat, lam_hat)
ll_null = invgauss_loglik(x, mu0, lam_hat_mu0)

lrt = 2.0 * (ll_alt - ll_null)
p_value = float(stats.chi2.sf(lrt, df=1))

print("true (mu, lam):", (mu_true, lam_true))
print("H0 mu0:", mu0)
print("MLE (mu, lam):", (mu_hat, lam_hat))
print("LRT statistic:", float(lrt))
print("approx p-value:", p_value)
true (mu, lam): (1.4, 3.5)
H0 mu0: 1.2
MLE (mu, lam): (1.381122312541543, 3.4295590938454583)
LRT statistic: 165.08395022407058
approx p-value: 8.769428012935727e-38
# B) Simple grid Bayesian posterior for mu (treat lambda as known)
mu_true, lam_known = 1.6, 4.0
x = stats.invgauss(mu_true / lam_known, scale=lam_known).rvs(size=400, random_state=rng)

# Prior: log-normal on mu (median 1.5)
prior = stats.lognorm(s=0.35, scale=1.5)

mu_grid = np.linspace(0.4, 3.2, 800)
log_prior = prior.logpdf(mu_grid)
log_like = np.array([invgauss_loglik(x, mu, lam_known) for mu in mu_grid])
log_post_unnorm = log_prior + log_like

# Normalize on the grid (log-sum-exp)
log_post = log_post_unnorm - scipy.special.logsumexp(log_post_unnorm)
post = np.exp(log_post)

post_mean = float(np.sum(mu_grid * post))
cdf_post = np.cumsum(post)
cdf_post /= cdf_post[-1]
ci_low = float(mu_grid[np.searchsorted(cdf_post, 0.025)])
ci_high = float(mu_grid[np.searchsorted(cdf_post, 0.975)])

print("true mu:", mu_true)
print("posterior mean:", post_mean)
print("95% credible interval:", (ci_low, ci_high))

fig = go.Figure()
fig.add_trace(go.Scatter(x=mu_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=mu_true, line_dash="dash", line_color="black")
fig.update_layout(title="Posterior for μ (λ known)", xaxis_title="μ", yaxis_title="posterior density (grid)")
fig.show()
true mu: 1.6
posterior mean: 1.6233797987498713
95% credible interval: (1.5284105131414267, 1.7281602002503131)
# C) Generative modeling: normal mean–variance mixture with IG mixing
n = 80_000
mu_v, lam_v = 1.0, 1.5
beta = 0.6

v = invgauss_rvs_numpy(mu_v, lam_v, size=n, rng=rng)
z = rng.standard_normal(n)

# Y | V=v ~ Normal(beta*v, v)
y = beta * v + np.sqrt(v) * z

mean_y = float(np.mean(y))
std_y = float(np.std(y))
print("mixture sample mean/std:", (mean_y, std_y))

# Compare to a Gaussian with the same mean/std
x = np.linspace(mean_y - 5 * std_y, mean_y + 5 * std_y, 800)
fig = px.histogram(y, nbins=160, histnorm="probability density", title="IG mixing yields heavier tails than Gaussian")
fig.add_trace(go.Scatter(x=x, y=stats.norm.pdf(x, loc=mean_y, scale=std_y), mode="lines", name="matched Normal"))
fig.update_layout(bargap=0.02)
fig.show()
mixture sample mean/std: (0.6083134294865697, 1.1183377846342366)

11) Pitfalls#

  1. Parameterization confusion

  • The literature often uses \((\mu,\lambda)\) (mean + shape).

  • SciPy uses wald for a standardized special case, and invgauss(mu, scale) for the general case.

  • To represent \(\mathrm{IG}(\mu,\lambda)\) in SciPy, use invgauss(mu/lam, scale=lam, loc=0).

  2. Invalid parameters / support

  • Require \(\mu>0\), \(\lambda>0\), and data \(x_i>0\).

  • Using loc shifts the support to \((\mathrm{loc},\infty)\); fitting loc freely can yield surprising results.

  3. Numerical issues

  • For small \(x\) or large \(\lambda/\mu\), PDF/CDF computations can underflow/overflow.

  • Prefer logpdf, and for the CDF use stable forms (e.g. via log_ndtr as above).

  4. Goodness-of-fit p-values after fitting

  • If you estimate parameters from the same data, naive KS-test p-values are not calibrated; use a parametric bootstrap if you need a valid GOF p-value (a minimal sketch follows this list).
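A minimal parametric-bootstrap sketch for the last pitfall (the data, names like d_obs/d_boot, and the 200 replicates are illustrative; use more replicates in practice):

import numpy as np
from scipy import stats

rng_gof = np.random.default_rng(7)
data = stats.invgauss(1.4 / 3.0, scale=3.0).rvs(size=500, random_state=rng_gof)

# Fit IG with loc fixed at 0, then measure the KS distance to the *fitted* model
m, _, s = stats.invgauss.fit(data, floc=0)
fitted = stats.invgauss(m, loc=0, scale=s)
d_obs = stats.kstest(data, fitted.cdf).statistic

# Parametric bootstrap: simulate from the fit, refit, recompute the KS distance
d_boot = []
for _ in range(200):
    sim = fitted.rvs(size=data.size, random_state=rng_gof)
    mb, _, sb = stats.invgauss.fit(sim, floc=0)
    d_boot.append(stats.kstest(sim, stats.invgauss(mb, loc=0, scale=sb).cdf).statistic)

p_boot = float(np.mean(np.array(d_boot) >= d_obs))
print("KS statistic:", float(d_obs), " bootstrap p-value:", p_boot)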

12) Summary#

  • The Wald / inverse Gaussian is a continuous distribution on \((0,\infty)\) with mean \(\mu\) and shape \(\lambda\).

  • It naturally models first-passage times of drifted Brownian motion: \(\mu=a/\nu\) and \(\lambda=a^2/\sigma^2\).

  • Key moments: \(\mathbb{E}[X]=\mu\), \(\mathrm{Var}(X)=\mu^3/\lambda\), skewness \(=3\sqrt{\mu/\lambda}\).

  • scipy.stats.wald is a standardized special case; for general \((\mu,\lambda)\) use scipy.stats.invgauss(mu/lam, scale=lam).

  • Exact sampling is easy with a fast NumPy-only algorithm (Michael–Schucany–Haas).